Goto

Collaborating Authors

 online dynamic programming


Online Dynamic Programming

Neural Information Processing Systems

We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials. An instance of the type of problems we consider is to find a good binary search tree in a changing environment. At the beginning of each trial, the learner probabilistically chooses a tree with the n keys at the internal nodes and the n + 1 gaps between keys at the leaves. The learner is then told the frequencies of the keys and gaps and is charged by the average search cost for the chosen tree. The problem is online because the frequencies can change between trials. The goal is to develop algorithms with the property that their total average search cost (loss) in all trials is close to the total loss of the best tree chosen in hindsight for all trials. The challenge, of course, is that the algorithm has to deal with exponential number of trees. We develop a general methodology for tackling such problems for a wide class of dynamic programming algorithms. Our framework allows us to extend online learning algorithms like Hedge and Component Hedge to a significantly wider class of combinatorial objects than was possible before.


Reviews: Online Dynamic Programming

Neural Information Processing Systems

The aim of this paper is to extend results in online combinatorial optimization to new combinatorial objects such as k -multipaths and binary search trees. Specifically, the authors are interested in extending Expanded-Hedge (EH) and Component Hedge (CH) to online combinatorial games, for which the offline problem can be solved in a bottom-up way using Dynamic Programming. Though the results about online optimization with k -multipaths are interesting, the overall paper could benefit from more polishing: some definitions and notations are not clearly defined, and some assertions look incorrect. Namely: In Section 4, the set (of subsets) \mathcal H_v is not clearly defined, so it is very difficult for the reader to capture the family of DP tasks defined according to the recurrence relation (3). In this section, some concrete examples of online optimization problems characterized by this DP family would be helpful.


Online Dynamic Programming

Neural Information Processing Systems

We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials. An instance of the type of problems we consider is to find a good binary search tree in a changing environment. At the beginning of each trial, the learner probabilistically chooses a tree with the n keys at the internal nodes and the n 1 gaps between keys at the leaves. The learner is then told the frequencies of the keys and gaps and is charged by the average search cost for the chosen tree. The problem is online because the frequencies can change between trials.